maximization step
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)
Model Adaptation: Historical Contrastive Learning for Unsupervised Domain Adaptation without Source Data (Supplemental Materials)
Maximum likelihood (ML) is a concept used to describe the theoretical insights of clustering algorithms. As it is not easy to optimize Eq. 14 directly, we employ a surrogate function to lower-bound it […]. Please note that the notation "t m" indicates that k is encoded by a historical encoder. In practice, we achieve Eq. 9 by minimizing a historical contrastive instance discrimination (HCID) loss. Please note that Eq. 10 is an instance of Eq. 9; the two equations look different because 1) Eq. 10 […]. Proposition 2: HCID is convergent under certain conditions. We have shown in Section A.1 that the inequality in Eq. 11 holds with equality if […]. One possible way to achieve Eq. 13 is to conduct gradient descent by minimizing the historical […]. Different from the classical expectation maximization (mentioned in Section A.1), which consists of […]. It can be observed that Eq. 16 is the same as Eq. 15 except that it involves an extra weighting element. In the following, we show that the optimization of Eq. 16 is a CEM process.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)
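The historical contrastive instance discrimination loss mentioned above can be sketched in an InfoNCE-style form, with the positive key produced by the historical encoder. This is a hedged reconstruction: the function name, shapes, and temperature `tau` are illustrative assumptions, not the paper's exact Eq. 10.

```python
import numpy as np

def hcid_loss(q, k_hist, queue, tau=0.07):
    """InfoNCE-style instance discrimination loss where the positive key
    comes from a historical encoder. q: (N, D) current-encoder embeddings;
    k_hist: (N, D) historical-encoder embeddings of the same instances;
    queue: (K, D) negative keys."""
    def unit(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    q, k_hist, queue = unit(q), unit(k_hist), unit(queue)
    pos = np.sum(q * k_hist, axis=1, keepdims=True)   # (N, 1) positive logits
    neg = q @ queue.T                                 # (N, K) negative logits
    logits = np.concatenate([pos, neg], axis=1) / tau
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[:, 0].mean()                     # positive sits at index 0
```

The loss is small when each instance's current embedding matches its own historical embedding and differs from the queued negatives.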
Partially Observable Contextual Bandits with Linear Payoffs
Zeng, Sihan, Bhatt, Sujay, Koppel, Alec, Ganesh, Sumitra
We study contextual bandits where the context is not fully observable, a setting that departs significantly from the classic literature. Consider the problem of making trading decisions, where the arms correspond to different algorithmic trading strategies and the reward is the monetary gain. The reward is a function of the evolving market condition (context), with potential influences from various exogenous factors such as Twitter feeds, secondary market behaviour, and local trends, of which only a small subset can be directly observed. Large institutional investors may spend additional resources on tracking other relevant features that reveal information about the true underlying context. The goal is to quickly identify and play the best strategy that maximizes the cumulative gain from trading. Motivated by problems of this nature, we introduce and study a partially observable linear contextual bandit framework, where a decision maker interacts with an environment over T rounds.
EM Pre-training for Multi-party Dialogue Response Generation
Dialogue response generation requires an agent to generate a response according to the current dialogue history. Two-party dialogues have been well studied for this task, but a great gap remains for multi-party dialogues. Different from two-party dialogues, where each response is a direct reply to its previous utterance, in the multi-party scenario the addressee of a response utterance must be specified before the response is generated. Thanks to the huge amount of two-party conversational data, various pre-trained language models for two-party dialogue response generation have been proposed. However, due to the lack of annotated addressee labels in multi-party dialogue datasets, it is hard to use them to pre-train a response generation model for multi-party dialogues. To tackle this obstacle, we propose an Expectation-Maximization (EM) approach that iteratively performs expectation steps to generate addressee labels and maximization steps to optimize a response generation model. Theoretical analyses and extensive experiments justify the feasibility and effectiveness of the proposed method.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
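The alternation described above — expectation steps that fill in missing addressee labels, maximization steps that refit the generation model — can be sketched as hard EM on a toy multinomial model. The model, data, and names below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def hard_em_addressee(counts, K=2, iters=20, restarts=5, seed=0):
    """Hard-EM sketch: the E-step fills in the unobserved addressee label of
    each utterance, and the M-step refits the 'generation model' -- here just
    a per-addressee token distribution with add-one smoothing. A few random
    restarts guard against degenerate initializations."""
    rng = np.random.default_rng(seed)
    N, V = counts.shape
    best = None
    for _ in range(restarts):
        z = rng.integers(0, K, N)                 # random initial labels
        for _ in range(iters):
            # M-step: per-addressee token distributions from current labels
            theta = np.ones((K, V))               # add-one smoothing
            for k in range(K):
                theta[k] += counts[z == k].sum(axis=0)
            theta /= theta.sum(axis=1, keepdims=True)
            # E-step: relabel each utterance with its most likely addressee
            loglik = counts @ np.log(theta).T     # (N, K)
            z = loglik.argmax(axis=1)
        score = loglik[np.arange(N), z].sum()     # complete-data log-likelihood
        if best is None or score > best[0]:
            best = (score, z, theta)
    return best[1], best[2]
```

In the paper the M-step optimizes a neural response generator rather than token counts, but the control flow is the same.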
Learning Nonlinear Dynamical Systems Using an EM Algorithm
The Expectation-Maximization (EM) algorithm is an iterative procedure for maximum likelihood parameter estimation from data sets with missing or hidden variables [2]. It has been applied to system identification in linear stochastic state-space models, where the state variables are hidden from the observer and both the state and the parameters of the model have to be estimated simultaneously [9]. We present a generalization of the EM algorithm for parameter estimation in nonlinear dynamical systems. The "expectation" step makes use of Extended Kalman Smoothing to estimate the state, while the "maximization" step re-estimates the parameters using these uncertain state estimates. In general, the nonlinear maximization step is difficult because it requires integrating out the uncertainty in the states.
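For the linear special case cited as [9], the EM iteration can be written compactly; a scalar sketch is below. The nonlinear generalization described in the abstract replaces the RTS smoother in the E-step with Extended Kalman Smoothing. Re-estimating only the transition coefficient `a`, with the noise variances `q` and `r` held fixed, is a simplification for brevity:

```python
import numpy as np

def em_lgss(y, a0=0.5, q=0.1, r=0.1, iters=30):
    """EM for the scalar linear-Gaussian state-space model
         x_t = a x_{t-1} + w_t,  w_t ~ N(0, q)
         y_t = x_t + v_t,        v_t ~ N(0, r).
    E-step: Kalman filter + RTS smoother; M-step: closed-form update of a."""
    T = len(y)
    a = a0
    for _ in range(iters):
        # --- E-step: Kalman filter (predicted and filtered moments) ---
        mf = np.zeros(T); Pf = np.zeros(T)
        mp = np.zeros(T); Pp = np.zeros(T)
        for t in range(T):
            if t == 0:
                mp[t], Pp[t] = 0.0, 1.0                  # prior on x_0
            else:
                mp[t], Pp[t] = a * mf[t-1], a * a * Pf[t-1] + q
            K = Pp[t] / (Pp[t] + r)
            mf[t] = mp[t] + K * (y[t] - mp[t])
            Pf[t] = (1 - K) * Pp[t]
        # --- E-step: RTS smoother, incl. lag-one cross-covariances ---
        ms = mf.copy(); Ps = Pf.copy()
        Pcross = np.zeros(T)                             # Cov(x_t, x_{t-1} | y)
        for t in range(T - 2, -1, -1):
            C = Pf[t] * a / Pp[t+1]
            ms[t] = mf[t] + C * (ms[t+1] - mp[t+1])
            Ps[t] = Pf[t] + C * C * (Ps[t+1] - Pp[t+1])
            Pcross[t+1] = C * Ps[t+1]
        # --- M-step: a = sum E[x_t x_{t-1}] / sum E[x_{t-1}^2] ---
        num = np.sum(ms[1:] * ms[:-1] + Pcross[1:])
        den = np.sum(ms[:-1] ** 2 + Ps[:-1])
        a = num / den
    return a
```

The M-step is closed-form here precisely because the model is linear; in the nonlinear case this step requires integrating the nonlinearity over the smoothed state uncertainty, which is the difficulty the abstract points out.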
Half-Space and Box Constraints as NUV Priors: First Results
Keusch, Raphael, Loeliger, Hans-Andrea
Normals with unknown variance (NUV) can represent many useful priors and blend well with Gaussian models and message passing algorithms. NUV representations of sparsifying priors have long been known, and NUV representations of binary (and M-level) priors have been proposed very recently. In this document, we propose NUV representations of half-space constraints and box constraints, which makes it possible to add such constraints to any linear Gaussian model, with any of the previously known NUV priors, without affecting computational tractability.
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
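As background on why NUV representations can be sparsifying, the best-known example follows directly from maximizing a zero-mean Gaussian over its unknown variance:

```latex
\max_{\sigma^2 > 0} \mathcal{N}(x; 0, \sigma^2)
  = \max_{\sigma^2 > 0} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{x^2}{2\sigma^2}\right)
```

Setting the derivative of the log to zero, $-\tfrac{1}{2\sigma^2} + \tfrac{x^2}{2\sigma^4} = 0$, gives $\sigma^2 = x^2$, hence

```latex
\max_{\sigma^2 > 0} \mathcal{N}(x; 0, \sigma^2)
  = \frac{1}{\sqrt{2\pi x^2}}\, e^{-1/2}
  = \frac{1}{\sqrt{2\pi e}\,\lvert x\rvert}
  \;\propto\; \frac{1}{\lvert x\rvert},
```

an improper prior that strongly favors $x$ near zero, i.e., sparsity.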
Gaussian Mixture Estimation from Weighted Samples
Frisch, Daniel, Hanebeck, Uwe D.
Given a set of samples, the parameters of a GM are determined so as to best fit the samples in a maximum likelihood sense. Solutions for equally weighted samples are readily available, expectation-maximization (EM) based methods being the most prevalent because of their low computational requirements and ease of implementation. So it comes as a surprise that GM estimation for weighted samples is hard to find in the literature. It might be even more surprising that the standard reference [1] gives incorrect results, see Figure 1.

2. Context

Applications for sample-to-density function approximation include clustering of unlabeled data [2, 3], multi-target tracking [4, 5], group tracking [6], multilateration [7, 8], and arbitrary density representation in nonlinear filters [9, 10]. A popular basic solution to this is the k-means algorithm. It does not find a complete density representation, only the means of the individual clusters. The k-means algorithm uses hard sample-to-mean associations and therefore yields merely approximate solutions, but it can be computationally optimized using k-d trees [11, 12]. Moreover, its global optimum can be found deterministically [13], so it can be used to provide an initial guess for more elaborate algorithms. A sample-to-density approximation that is optimal in a maximum likelihood sense can be searched for with numerical optimization techniques such as the Newton algorithm, which has quadratic convergence but high computational demand per iteration; quasi-Newton methods; the method of scoring; or the conjugate gradient method, with slower convergence but less computational effort per iteration [14].
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
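A minimal version of EM for weighted samples only requires multiplying each responsibility by its sample weight in the M-step. The 1-D sketch below is a plausible baseline under that assumption, not the authors' derivation; the spread-out initialization of the means is an implementation choice for determinism:

```python
import numpy as np

def weighted_gmm_em(x, w, K=2, iters=100):
    """EM for a 1-D Gaussian mixture fitted to weighted samples (x[i], w[i]).
    The only change from standard EM is that the M-step statistics use
    w[i] * gamma[i, k] instead of gamma[i, k] alone."""
    x = np.asarray(x, float)
    w = np.asarray(w, float) / np.sum(w)
    mu = np.linspace(x.min(), x.max(), K)       # deterministic spread init
    var = np.full(K, np.var(x))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities gamma[i, k], computed in log space
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        g = np.exp(logp)
        g /= g.sum(axis=1, keepdims=True)
        # M-step: weight each responsibility by its sample weight
        wg = w[:, None] * g
        Nk = wg.sum(axis=0)
        mu = (wg * x[:, None]).sum(axis=0) / Nk
        var = (wg * (x[:, None] - mu) ** 2).sum(axis=0) / Nk + 1e-9
        pi = Nk
    return mu, var, pi
```

With uniform weights this reduces to standard EM; with weights taken as density values on a grid, it performs the sample-to-density approximation discussed above.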
A Primer on the EM Algorithm
The Expectation-Maximization (EM) algorithm is one of the main algorithms in machine learning for estimation of model parameters [2, 3, 4]. For example, it is used to estimate mixing coefficients, means, and covariances in mixture models, as shown in Figure 1. Its objective is to maximize the likelihood p(X | θ), where X is a matrix of observed data and θ is a vector of model parameters. This is maximum likelihood estimation, and in practice the log-likelihood ln p(X | θ) is maximized. The model parameters that maximize this function are deemed to be the correct model parameters.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.79)
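The maximization proceeds via the standard lower-bound decomposition: for any distribution $q(Z)$ over the hidden variables $Z$,

```latex
\ln p(X \mid \theta)
  = \underbrace{\sum_Z q(Z)\,\ln\frac{p(X, Z \mid \theta)}{q(Z)}}_{\mathcal{L}(q,\,\theta)}
  \;+\;
  \underbrace{\mathrm{KL}\!\left(q(Z)\,\big\|\,p(Z \mid X, \theta)\right)}_{\ge\, 0}.
```

The E-step sets $q(Z) = p(Z \mid X, \theta^{\text{old}})$, which makes the bound $\mathcal{L}$ tight; the M-step maximizes $\mathcal{L}$ over $\theta$ with $q$ fixed. Since the KL term is nonnegative, each EM cycle cannot decrease $\ln p(X \mid \theta)$.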
Deep imagination is a close to optimal policy for planning in large decision trees under limited resources
Moreno-Bote, Ruben, Mastrogiuseppe, Chiara
Many decisions involve choosing an uncertain course of actions in deep and wide decision trees, as when we plan to visit an exotic country for vacation. In these cases, exhaustive search for the best sequence of actions is not tractable due to the large number of possibilities and the limited time or computational resources available to make the decision. Therefore, planning agents need to balance breadth (exploring many actions at each level of the tree) and depth (exploring many levels of the tree) to optimally allocate their finite search capacity. We provide efficient analytical solutions and numerical analysis for the problem of allocating finite sampling capacity in one shot to large decision trees. We find that in general the optimal policy is to allocate few samples per level so that deep levels can be reached, thus favoring depth over breadth of search. In contrast, in poor environments and at low capacity, it is best to sample branches broadly at the cost of not sampling deeply, although this policy is only marginally better than deep allocations. Our results provide a theoretical foundation for the optimality of deep imagination in planning and show that it is a generally valid heuristic that could have evolved from the finite constraints of cognitive systems.
- Europe > Spain (0.28)
- North America > United States (0.28)
- Education > Curriculum > Subject-Specific Education (0.67)
- Health & Medicine > Therapeutic Area (0.46)
Resilient Monotone Sequential Maximization
Tzoumas, Vasileios, Jadbabaie, Ali, Pappas, George J.
Applications in machine learning, optimization, and control require the sequential selection of a few system elements, such as sensors, data, or actuators, to optimize system performance across multiple time steps. However, in failure-prone and adversarial environments, sensors get attacked, data get deleted, and actuators fail. Hence, traditional sequential design paradigms become insufficient and, in contrast, resilient sequential designs that adapt against system-wide attacks, deletions, or failures become important. In general, resilient sequential design problems are computationally hard. Also, even though they often involve objective functions that are monotone and (possibly) submodular, no scalable approximation algorithms are known for their solution. In this paper, we provide the first scalable algorithm, which achieves the following characteristics: system-wide resiliency, i.e., the algorithm is valid for any number of denial-of-service attacks, deletions, or failures; adaptiveness, i.e., at each time step the algorithm selects system elements based on the history of inflicted attacks, deletions, or failures; and provable approximation performance, i.e., for monotone objective functions the algorithm guarantees a solution close to the optimal. We quantify the algorithm's approximation performance using a notion of curvature for monotone (not necessarily submodular) set functions. Finally, we support our theoretical analyses with simulated experiments, considering a control-aware sensor scheduling scenario, namely sensing-constrained robot navigation.
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
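For context, the non-resilient starting point for such problems is plain greedy maximization of a monotone set function, which the resilient algorithm adapts. A minimal sketch (the coverage function in the usage note is only an example objective):

```python
def greedy_select(ground, f, k):
    """Greedily pick k elements, each maximizing the marginal gain of a
    monotone set function f. For monotone submodular f this achieves the
    classical (1 - 1/e) guarantee; the paper's resilient variant must in
    addition hedge against worst-case removals from the chosen set."""
    S = set()
    for _ in range(k):
        gains = {e: f(S | {e}) - f(S) for e in ground if e not in S}
        S.add(max(gains, key=gains.get))
    return S
```

For example, with a coverage objective over the sets {1, 2, 3}, {3, 4}, {5}, greedy with k = 2 first picks the largest set and then one of the two equally good remainders, covering four elements in total.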